Subversion / SVN-2286

Identical files should share storage space in repository

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 1.6.0
    • Component/s: libsvn_fs
    • Labels: None

    Description

      See the link below for discussion.
      
      When using branches it often happens that identical changes are done to copied
      files; this results in wasted storage space.
      
      Using e.g. the MD5 hash as an index, it should be possible to find such
      duplicates and, instead of storing a new delta or even a fulltext, just store a
      reference to the other "inode" in the repository (the simplest case is
      [filename,revision]; an internal pointer would be better for speed reasons).
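
The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how libsvn_fs does it: the `Md5Index` class and its `store` method are hypothetical names, the index lives in memory, and the "inode" is the simplest case from the text, a (filename, revision) pair. A real implementation would presumably also have to guard against hash collisions, for example by comparing contents.

```python
import hashlib


class Md5Index:
    """Hypothetical in-memory index: MD5 hex digest -> (path, revision)."""

    def __init__(self):
        self._entries = {}

    def store(self, path, revision, content):
        """Return (location, shared) for `content`.

        If identical content was stored before, the earlier location is
        returned and `shared` is True; nothing new needs to be written.
        Otherwise this (path, revision) is recorded as the location.
        """
        digest = hashlib.md5(content).hexdigest()
        existing = self._entries.get(digest)
        if existing is not None:
            return existing, True
        self._entries[digest] = (path, revision)
        return (path, revision), False


index = Md5Index()
first = index.store("trunk/a.c", 10, b"int main(void) { return 0; }\n")
second = index.store("branches/b1/a.c", 12, b"int main(void) { return 0; }\n")
```

Here `second` points back at the trunk copy, so the branch would only need to record the reference rather than a new delta or fulltext.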
      
      Con: FSFS can no longer be append-only; the indexes have to be written and rewritten.
      
      Furthermore, I'd like this to be more of a cache, so that it can be generated,
      deleted and regenerated at any time (at very high speed, since an MD5 is
      archived for every file).
      
      For FSFS I'd suggest adding a new directory that uses two layers of indirection
      down the hierarchy.
      E.g. for a file with an MD5 of 8a04f87ad04f4a1d3c7e6ca12e07290d:
      
      repository/
        dav/
        ...
        db/
          revs/
          revprops/
          transactions/
          md5index/
            8a/
              04/
                f87a.index
      
      If this index has more than, say, 256 entries (which should be sorted in the
      file), it could be split into 16 new parts.
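
As a rough sketch of the layout above: the mapping from a digest to its index file follows the example path (two hex digits, two hex digits, then four hex digits plus `.index`). The function names `index_path` and `split_bucket` are hypothetical, and splitting on the ninth hex digit is only one possible reading of "16 new parts"; the proposal does not say which digit to split on.

```python
HEX_DIGITS = "0123456789abcdef"


def index_path(md5_hex):
    """Map an MD5 hex digest to its index file, relative to the repository root."""
    md5_hex = md5_hex.lower()
    if len(md5_hex) != 32 or any(c not in HEX_DIGITS for c in md5_hex):
        raise ValueError("expected a 32-character hexadecimal MD5 digest")
    # Two directory levels of two hex digits each, then a four-digit file name.
    return "/".join(
        ["db", "md5index", md5_hex[0:2], md5_hex[2:4], md5_hex[4:8] + ".index"]
    )


def split_bucket(digests):
    """Split the digests of one overfull index file into 16 sorted groups.

    Assumption: the groups are keyed by the ninth hex digit, i.e. the first
    digit not already encoded in the index file's path.
    """
    parts = {digit: [] for digit in HEX_DIGITS}
    for digest in sorted(d.lower() for d in digests):
        parts[digest[8]].append(digest)
    return parts


path = index_path("8a04f87ad04f4a1d3c7e6ca12e07290d")
# path == "db/md5index/8a/04/f87a.index"
```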
      
      
      I believe that could save a lot of space, especially for scenarios with many
      branches.
      

      http://marc.theaimsgroup.com/?l=subversion-dev&m=111319801911398&w=2

      Original issue reported by pmarek

          People

            Assignee: Unassigned
            Reporter: Subversion Importer